57 research outputs found
Methodological Issues in Multistage Genome-Wide Association Studies
Because of the high cost of commercial genotyping chip technologies, many
investigations have used a two-stage design for genome-wide association
studies, using part of the sample for an initial discovery of ``promising''
SNPs at a less stringent significance level and the remainder in a joint
analysis of just these SNPs using custom genotyping. Typical cost savings of
about 50% are possible with this design to obtain comparable levels of overall
type I error and power by using about half the sample for stage I and carrying
about 0.1% of SNPs forward to the second stage, the optimal design depending
primarily upon the ratio of costs per genotype for stages I and II. However,
with the rapidly declining costs of the commercial panels, the generally low
observed ORs of current studies, and many studies aiming to test multiple
hypotheses and multiple endpoints, many investigators are abandoning the
two-stage design in favor of simply genotyping all available subjects using a
standard high-density panel. Concern is sometimes raised about the absence of a
``replication'' panel in this approach, as required by some high-profile
journals, but it must be appreciated that the two-stage design is not a
discovery/replication design but simply a more efficient design for discovery
using a joint analysis of the data from both stages. Once a subset of
highly-significant associations has been discovered, a truly independent
``exact replication'' study is needed in a similar population of the same
promising SNPs using similar methods.
Comment: Published at http://dx.doi.org/10.1214/09-STS288 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
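As a rough illustration of the cost argument in the abstract above, the sketch below computes the genotyping cost of a two-stage design relative to genotyping every subject on the full panel. The function name, the cost model (cost proportional to subjects times SNPs times price per genotype), and the example cost ratio of 20 are assumptions made for illustration, not values taken from the paper.

```python
def relative_two_stage_cost(stage1_fraction, snps_carried_fraction, cost_ratio):
    """Cost of a two-stage design divided by the cost of a one-stage design.

    stage1_fraction       -- share of the sample genotyped on the full panel
    snps_carried_fraction -- share of SNPs carried forward to stage II
    cost_ratio            -- stage II cost per genotype / stage I cost per genotype
                             (hypothetical input; custom genotyping is assumed
                             to cost more per genotype than the panel)
    """
    if not 0.0 <= stage1_fraction <= 1.0:
        raise ValueError("stage1_fraction must lie in [0, 1]")
    if not 0.0 <= snps_carried_fraction <= 1.0:
        raise ValueError("snps_carried_fraction must lie in [0, 1]")
    if cost_ratio < 0.0:
        raise ValueError("cost_ratio must be non-negative")

    # Stage I: a fraction of subjects, all SNPs, at the panel price.
    stage1 = stage1_fraction
    # Stage II: the remaining subjects, only the carried SNPs, at the custom price.
    stage2 = (1.0 - stage1_fraction) * snps_carried_fraction * cost_ratio
    return stage1 + stage2


if __name__ == "__main__":
    rel = relative_two_stage_cost(0.5, 0.001, 20.0)
    print(f"relative cost: {rel:.3f}, saving: {1 - rel:.1%}")
```

With half the sample in stage I, 0.1% of SNPs carried forward, and the assumed cost ratio of 20, the relative cost is 0.51, a saving of about 49%, in line with the "about 50%" quoted in the abstract. The stage II term stays small unless the cost ratio is very large, which is why the saving is driven mainly by the stage I fraction.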
Comparison of family-based association tests in chromosome regions selected by linkage-based confidence intervals
We use the Genetic Analysis Workshop 14 simulated data to explore the effectiveness of a two-stage strategy for mapping complex disease loci consisting of an initial genome scan with confidence interval construction for gene location, followed by fine mapping with family-based tests of association on a dense set of single-nucleotide polymorphisms. We considered four types of intervals: the 1-LOD interval, a basic percentile bootstrap confidence interval based on the position of the maximum Zlr score, and asymptotic and bootstrap confidence intervals based on a generalized estimating equations method. For fine mapping we considered two family-based tests of association: a test based on a likelihood ratio statistic and a transmission-disequilibrium-type test implemented in the software FBAT. In two of the simulation replicates, we found that the bootstrap confidence intervals based on the peak Zlr and the 1-LOD support interval always contained the true disease loci and that the likelihood ratio test provided further strong confirmatory evidence of the presence of disease loci in these regions
Genetic Variation in the Base Excision Repair Pathway, Environmental Risk Factors, and Colorectal Adenoma Risk
Cigarette smoking, high alcohol intake, and low dietary folate levels are risk factors for colorectal adenomas. Oxidative damage caused by these three factors can be repaired through the base excision repair (BER) pathway. We hypothesized that genetic variation in BER might modify colorectal adenoma risk. In a sigmoidoscopy-based study, we examined associations between 182 haplotype-tagging SNPs in 14 BER genes and colorectal adenoma risk, and examined their potential role as modifiers of the effect of cigarette smoking, alcohol intake, and dietary folate levels. Among all individuals, no statistically significant associations between BER SNPs and adenoma risk persisted after correction for multiple comparisons. However, among Asian-Pacific Islanders we observed two SNPs in FEN1 and one in NTHL1, and among African-Americans one SNP in APEX1, that were associated with colorectal adenoma risk. Significant associations were also observed between SNPs in the NEIL2 gene and rectal adenoma risk. Three SNPs modified the effect of smoking (MUTYH interaction p = 0.002; OGG1 interaction p = 0.013; FEN1 interaction p = 0.013), one SNP in LIG3 modified the effect of alcohol consumption (interaction p = 0.024), and two SNPs in LIG3 modified the effect of dietary folate (interaction p = 0.001 and p = 0.08) on colorectal adenoma risk. These findings support a role for genetic variants in the BER pathway as potential modifiers of colorectal adenoma risk. Our findings strengthen the role of oxidative damage induced by key lifestyle and dietary risk factors in colorectal adenoma formation.
Tobacco smoking, polymorphisms in carcinogen metabolism enzyme genes, and risk of localized and advanced prostate cancer: results from the California Collaborative Prostate Cancer Study
The relationship between tobacco smoking and prostate cancer (PCa) remains inconclusive. This study examined the association between tobacco smoking and PCa risk, taking into account polymorphisms in carcinogen metabolism enzyme genes as possible effect modifiers (9 polymorphisms and 1 predicted phenotype from metabolism enzyme genes). The study included cases (n = 761 localized; n = 1199 advanced) and controls (n = 1139) from the multiethnic California Collaborative Case–Control Study of Prostate Cancer. Multivariable conditional logistic regression was performed to evaluate the association between tobacco smoking variables and risk of localized and advanced PCa. Being a former smoker, regardless of time since quitting, was associated with an increased risk of localized PCa (odds ratio [OR] = 1.3; 95% confidence interval [CI] = 1.0–1.6). Among non-Hispanic Whites, ever smoking was associated with an increased risk of localized PCa (OR = 1.5; 95% CI = 1.1–2.1), whereas current smoking was associated with risk of advanced PCa (OR = 1.4; 95% CI = 1.0–1.9). However, no associations were observed between smoking intensity, duration or pack-year variables, and advanced PCa. No statistically significant trends were seen among Hispanics or African-Americans. The relationship between smoking status and PCa risk was modified by the CYP1A2 rs7662551 polymorphism (P-interaction = 0.008). In conclusion, tobacco smoking was associated with risk of PCa, primarily localized disease among non-Hispanic Whites. This association was modified by a genetic variant in CYP1A2, thus supporting a role for tobacco carcinogens in PCa risk.
The association of polymorphisms in hormone metabolism pathway genes, menopausal hormone therapy, and breast cancer risk: a nested case-control study in the California Teachers Study cohort
Introduction: The female sex steroids estrogen and progesterone are important in breast cancer etiology. It therefore seems plausible that variation in genes involved in metabolism of these hormones may affect breast cancer risk, and that these associations may vary depending on menopausal status and use of hormone therapy. Methods: We conducted a nested case-control study of breast cancer in the California Teachers Study cohort. We analyzed 317 tagging single nucleotide polymorphisms (SNPs) in 24 hormone pathway genes in 2746 non-Hispanic white women: 1351 cases and 1395 controls. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated by fitting conditional logistic regression models using all women or subgroups of women defined by menopausal status and hormone therapy use. P values were adjusted for multiple correlated tests (P_ACT). Results: The strongest associations were observed for SNPs in SLCO1B1, a solute carrier organic anion transporter gene, which transports estradiol-17β-glucuronide and estrone-3-sulfate from the blood into hepatocytes. Ten of 38 tagging SNPs of SLCO1B1 showed significant associations with postmenopausal breast cancer risk; 5 SNPs (rs11045777, rs11045773, rs16923519, rs4149057, rs11045884) remained statistically significant after adjusting for multiple testing within this gene (P_ACT = 0.019-0.046). In postmenopausal women who were using combined estrogen-progestin therapy (EPT) at cohort enrollment, the OR of breast cancer was 2.31 (95% CI = 1.47-3.62) per minor allele of rs4149013 in SLCO1B1 (P = 0.0003; within-gene P_ACT = 0.002; overall P_ACT = 0.023). SNPs in other hormone pathway genes evaluated in this study were not associated with breast cancer risk in premenopausal or postmenopausal women. Conclusions: We found evidence that genetic variation in SLCO1B1 is associated with breast cancer risk in postmenopausal women, particularly among those using EPT.
A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk
Colorectal cancer risk can be impacted by genetic, environmental, and lifestyle factors, including diet and obesity. Gene-environment interactions (G x E) can provide biological insights into the effects of obesity on colorectal cancer risk. Here, we assessed potential genome-wide G x E interactions between body mass index (BMI) and common SNPs for colorectal cancer risk using data from 36,415 colorectal cancer cases and 48,451 controls from three international colorectal cancer consortia (CCFR, CORECT, and GECCO). The G x E tests included the conventional logistic regression using multiplicative terms (one degree of freedom, 1DF test), the two-step EDGE method, and the joint 3DF test, each of which is powerful for detecting G x E interactions under specific conditions. BMI was associated with higher colorectal cancer risk. The two-step approach revealed a statistically significant G x BMI interaction located within the Formin 1/Gremlin 1 (FMN1/GREM1) gene region (rs58349661). This SNP was also identified by the 3DF test, with suggestive statistical significance in the 1DF test. Among participants with the CC genotype of rs58349661, overweight and obesity categories were associated with higher colorectal cancer risk, whereas null associations were observed across BMI categories in those with the TT genotype. Using data from three large international consortia, this study discovered a locus in the FMN1/GREM1 gene region that interacts with BMI in its association with colorectal cancer risk. Further studies should examine the potential mechanisms through which this locus modifies the etiologic link between obesity and colorectal cancer.
Data integration by multi-tuning parameter elastic net regression
Background: To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlation structures. Subtle but important features may be missed by shrinking all features equally. Results: We propose an elastic net (EN) model with separate tuning parameter penalties for each platform that is fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning penalties. We found that when the number of informative features differs among the platforms, and when there is no notable correlation between the features from different platforms, the multi-tuning parameter EN yields more predictive models. Moreover, the multi-tuning parameter EN is robust, in the sense that there is no loss of predictivity relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of multi-tuning parameter EN using real cancer datasets. Conclusion: The proposed multi-tuning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms, compared to the traditional method where a single penalty parameter is used for all features in different platforms.
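To make the idea of one penalty per platform concrete, the sketch below evaluates a penalized logistic regression objective in which each feature is shrunk by the tuning parameter of the platform it comes from. This is an illustrative formulation only: the function name, the argument names, the shared mixing weight `alpha`, and the omission of an intercept are assumptions, and the paper's exact parameterization may differ.

```python
import math


def multi_penalty_en_objective(X, y, beta, platform_of, lambdas, alpha=0.5):
    """Mean logistic loss plus a platform-specific elastic net penalty.

    X           -- list of feature rows (no intercept column; an assumption)
    y           -- list of 0/1 labels
    beta        -- list of coefficients, one per feature
    platform_of -- platform index of each feature (hypothetical encoding)
    lambdas     -- one tuning parameter per platform
    alpha       -- mix between L1 (alpha=1) and L2 (alpha=0) penalties
    """
    n = len(y)
    loss = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # Numerically stable log(1 + exp(eta)) minus the y * eta term.
        loss += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta

    penalty = 0.0
    for b, k in zip(beta, platform_of):
        # Each coefficient is penalized with its own platform's lambda.
        penalty += lambdas[k] * (alpha * abs(b) + 0.5 * (1.0 - alpha) * b * b)

    return loss / n + penalty
```

Setting every entry of `lambdas` to the same value recovers the usual single tuning parameter elastic net objective, so the multi-penalty form is a strict generalization of it.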
PEACOCK: a machine learning approach to assess the validity of cell type-specific enhancer-gene regulatory relationships
The vast majority of disease-associated variants identified in genome-wide association studies map to enhancers, powerful regulatory elements which orchestrate the recruitment of transcriptional complexes to their target genes' promoters to upregulate transcription in a cell type- and timing-dependent manner. These variants have implicated thousands of enhancers in many common genetic diseases, including nearly all cancers. However, the etiology of most of these diseases remains unknown because the regulatory target genes of the vast majority of enhancers are unknown. Thus, identifying the target genes of as many enhancers as possible is crucial for learning how enhancer regulatory activities function and contribute to disease. Based on experimental results curated from scientific publications coupled with machine learning methods, we developed a cell type-specific score predictive of an enhancer targeting a gene. We computed the score genome-wide for every possible cis enhancer-gene pair and validated its predictive ability in four widely used cell lines. Using a pooled final model trained across multiple cell types, all possible gene-enhancer regulatory links in cis (~17 M) were scored and added to the publicly available PEREGRINE database (www.peregrineproj.org). These scores provide a quantitative framework for enhancer-gene regulatory prediction that can be incorporated into downstream statistical analyses.
- …